Grouping and Categorization of Documents in Relativity Measure

Authors

  • V. Asaithambi
  • D. John Aravindhar
  • V. Dheepa
Abstract

This paper presents a spectral clustering method called correlation through preserving indexing (CPI), which operates in the correlation similarity measure space. The documents are projected into a low-dimensional semantic space in which the correlations between documents in the local patches are maximized while the correlations between documents outside these patches are minimized. The intrinsic structure of the document space is embedded in the similarities between the documents, and correlation is a more suitable similarity measure than Euclidean distance for detecting that structure. As a result, the proposed CPI method can effectively discover the intrinsic structures embedded in a high-dimensional document space. The effectiveness of the new method is demonstrated by extensive experiments conducted on various data sets and by comparison with existing document clustering methods.

Key Terms: Document Clustering, Correlation Latent Semantic Indexing, Dimensionality Reduction, Correlation Measure.
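
The abstract's claim that correlation suits document data better than Euclidean distance can be illustrated with a small sketch. It assumes that correlation means the normalized inner product of two term-count vectors (the abstract does not give a formula), and the three toy documents and all function names are hypothetical. This is not the CPI method itself, only a comparison of the two measures.

```python
import math

def correlation(u, v):
    """Normalized inner product of two term-count vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Ordinary Euclidean distance between two term-count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term counts over a four-word vocabulary.
doc_short = [2, 1, 0, 0]    # short document on topic A
doc_long = [20, 10, 0, 0]   # same topic A, ten times longer
doc_other = [0, 0, 3, 1]    # topic B, similar length to doc_short

corr_same_topic = correlation(doc_short, doc_long)
corr_other_topic = correlation(doc_short, doc_other)
dist_same_topic = euclidean(doc_short, doc_long)
dist_other_topic = euclidean(doc_short, doc_other)

print(f"correlation, same topic:  {corr_same_topic:.3f}")
print(f"correlation, other topic: {corr_other_topic:.3f}")
print(f"distance, same topic:     {dist_same_topic:.3f}")
print(f"distance, other topic:    {dist_other_topic:.3f}")
```

In this toy case the Euclidean distance places the short document closer to the unrelated one than to the longer document on the same topic, because it is dominated by document length. The correlation depends only on the direction of the vectors and therefore groups the two same-topic documents together.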

Similar resources

General and Professional Qualifications of Iranian High School Masters according to the Ministerial Documents

H. Abdollaahi, Ph.D. A perusal of the ministerial documents at the Iranian Ministry of Education during the past 100 years was undertaken in order to clarify how the general and professional qualifications required for the post of high school master have evolved over the c...

Concept Based Categorization of Documents for Search Engines

Nowadays, information retrieval is a challenging task for search engines. In this paper we discuss text categorization. Text document categorization is the process of classifying documents according to some predefined knowledge. Documents with the same concept are grouped together, and documents with different concepts form other groups according to their similarity of context of the d...

Arabic Text Categorization Algorithm using Vector Evaluation Method

Text categorization is the process of grouping documents into categories based on their contents. This process is important for making information retrieval easier, and it has become more important due to the huge amount of textual information available online. The main problem in text categorization is how to improve the classification accuracy. Although Arabic text categorization is a new and promising field, the...

Study of buffer effects on the grouping efficacy measure of stochastic cell formation problem

This paper deals with the stochastic cell formation problem (SCFP). It presents a new nonlinear integer programming model for the SCFP in which the effect of buffer size on the grouping efficacy of cells is investigated. The objective function is the maximization of the grouping efficacy of cells. A chance constraint is applied to explore the effect of the buffer on the SCFP. Processing tim...

New Methods for Text Categorization Based on a New Feature Selection Method and a New Similarity Measure Between Documents

In this paper, we present a new feature selection method based on document frequencies and statistical values. We also present a new similarity measure to calculate the degree of similarity between documents. Based on the proposed feature selection method and the proposed similarity measure between documents, we present three methods for dealing with the Reuters-21578 top 10 categories text cat...

Publication date: 2013